{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# NaiveBayes Classifier\n",
    "### Probabilitistic Model\n",
    "- From data $D$ we can **infer** the parameters $\\theta$ of model $M$ \n",
    "\n",
    ">$\\displaystyle p(\\theta \\lvert D) = \\frac{p(\\theta)\\,p(D \\lvert \\theta)}{p(D)}$ \n",
    ">\n",
    "\n",
    "### Likelihood Function\n",
    "- From data $D$ we can **infer** the parameters $\\theta$ of model $M$ \n",
    "\n",
    ">$\\displaystyle p(\\theta \\lvert D) = \\frac{\\pi(\\theta)\\,{\\cal{}L}_D(\\theta)}{Z}$ \n",
    ">\n",
    "> where the normalization\n",
    ">\n",
    ">$\\displaystyle Z = \\int \\pi(\\theta)\\,{\\cal{}L}_D(\\theta)\\ d\\theta $ \n",
    "\n",
    "- The **posterior** is proportional to the **prior** times the **likelihood function** \n",
    "\n",
    "### Naive Bayes Classifier\n",
    "\n",
    "- In general, we can use Bayes' rule (and law of total probability) to infer discrete classes $C_k$ for a given $\\boldsymbol{x}$ set of features\n",
    "\n",
    ">$\\displaystyle P(C_k \\lvert\\,\\boldsymbol{x}) = \\frac{\\pi(C_k)\\,{\\cal{}L}_{\\boldsymbol{x}}(C_k)}{Z} $ \n",
    "\n",
    "\n",
    "- Naively assuming the features are independent \n",
    "\n",
    ">$\\displaystyle {\\cal{}L}_{\\boldsymbol{x}}(C_k) = \\prod_{\\alpha}^d p(x_{\\alpha} \\lvert C_k)$ \n",
    "\n",
    "### Naive Bayes: Estimation\n",
    "\n",
    "- Look for maximum of the posterior\n",
    "\n",
    "\n",
    ">$\\displaystyle \\hat{k} =  \\mathrm{arg}\\max_k \\left[ \\pi_k \\prod_{\\alpha}^d G(x_{\\alpha};\\mu_{k,\\alpha}, \\sigma^2_{k,\\alpha})\\right]$ "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from sklearn import datasets\n",
    "from math import pi"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Load in Iris Data\n",
    "iris = datasets.load_iris()\n",
    "X = iris.data[:,:]\n",
    "y = iris.target"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0 [5.006 3.428 1.462 0.246] [0.12424898 0.1436898  0.03015918 0.01110612]\n",
      "1 [5.936 2.77  4.26  1.326] [0.26643265 0.09846939 0.22081633 0.03910612]\n",
      "2 [6.588 2.974 5.552 2.026] [0.40434286 0.10400408 0.30458776 0.07543265]\n"
     ]
    }
   ],
   "source": [
    "# calculate feature means and variances for each class\n",
    "classes = np.unique(y)\n",
    "param = dict()  # we save them in this dictionary\n",
    "for k in classes:\n",
    "    members = (iris.target == k) # boolean array\n",
    "    num = members.sum()    # True:1, False:0\n",
    "    prior = num / float(iris.target.size)\n",
    "    X = iris.data[members,:] # slice out members\n",
    "    mu = X.mean(axis=0)      # calc mean\n",
    "    X -= mu\n",
    "    var = (X*X).sum(axis=0) / (X[:,0].size-1)\n",
    "    param[k] = (num, prior, mu, var) # save results\n",
    "    print (k, mu, var)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Number of mislabeled points out of a total 150 points : 6\n"
     ]
    }
   ],
   "source": [
    "# init predicted values\n",
    "k_pred = -1 * np.ones(iris.target.size)\n",
    "\n",
    "# evaluate posterior for each point and find maximum\n",
    "for i in range(iris.target.size):\n",
    "    pmax, kmax = -1, None   # initialize to nonsense values\n",
    "    for k in classes:\n",
    "        num, prior, mu, var = param[k]\n",
    "        diff = iris.data[i,:] - mu\n",
    "        d2 = diff*diff / (2*var) \n",
    "        p = prior * np.exp(-d2.sum()) / np.sqrt(np.prod(2*pi*var))\n",
    "        if p > pmax:\n",
    "            pmax = p\n",
    "            kmax = k\n",
    "    k_pred[i] = kmax\n",
    "\n",
    "print(\"Number of mislabeled points out of a total {:d} points : {:d}\".format(iris.target.size, sum(iris.target!=k_pred)))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Number of mislabeled points out of a total 150 points : 6\n"
     ]
    }
   ],
   "source": [
    "# run sklearn's version - read up on differences if interested\n",
    "from sklearn.naive_bayes import GaussianNB\n",
    "\n",
    "gnb = GaussianNB()\n",
    "y_pred = gnb.fit(iris.data, iris.target).predict(iris.data)\n",
    "\n",
    "print(\"Number of mislabeled points out of a total {:d} points : {:d}\".format(iris.target.size, sum(iris.target!=y_pred)))"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
